.. _`K-means Clustering`:

.. _`org.sysess.sympathy.machinelearning.k_means`:

K-means Clustering
``````````````````

.. image:: dataset_blobs.svg
   :width: 48


Clusters data by trying to separate samples into n groups of equal variance.


Documentation
:::::::::::::

Attributes
==========

    **cluster_centers_**
        Coordinates of cluster centers. If the algorithm stops before fully
        converging (see ``tol`` and ``max_iter``), these will not be
        consistent with ``labels_``.


    **inertia_**
        Sum of squared distances of samples to their closest cluster center,
        weighted by the sample weights if provided.


    **labels_**
        Label of each point, i.e. the index of the cluster that each sample
        is assigned to.

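The fitted attributes can be inspected as in the following minimal sketch. It
assumes that this node wraps scikit-learn's ``sklearn.cluster.KMeans`` (the
attribute names match); the data ``X`` and the parameter values are made up
for illustration.

.. code-block:: python

    import numpy as np
    from sklearn.cluster import KMeans

    # Two well-separated groups of three 2-D points each (illustrative data).
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    centers = model.cluster_centers_   # shape (n_clusters, n_features)
    labels = model.labels_             # shape (n_samples,)

    # Recompute the inertia by hand: squared distance of every sample
    # to the center of the cluster it was assigned to.
    manual_inertia = float(((X - centers[labels]) ** 2).sum())

For a run that has converged, as this one should, the hand-computed value
agrees with ``inertia_`` up to floating-point error.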

Definition
::::::::::


Output ports
============

    **model**  model
        Model


Configuration
=============

    **K-means algorithm** (algorithm)
        K-means algorithm to use. The classical EM-style algorithm is "full".
        The "elkan" variation is more efficient on data with well-defined
        clusters, by using the triangle inequality. However it's more memory
        intensive due to the allocation of an extra array of shape
        (n_samples, n_clusters).

        For now "auto" (kept for backward compatibility) chooses "elkan", but
        this may change in the future in favor of a better heuristic.

        .. versionchanged:: 0.18
            Added Elkan algorithm

    **Initialization method** (init)
        Method for initialization:

        'k-means++': selects initial cluster centers for k-means
        clustering in a smart way to speed up convergence. See section
        Notes in k_init for more details.

        'random': chooses ``n_clusters`` observations (rows) at random from
        the data for the initial centroids.

        If an array is passed, it should be of shape (n_clusters, n_features)
        and gives the initial centers.

        If a callable is passed, it should take arguments X, n_clusters and a
        random state and return an initialization.
    **Maximum number of iterations** (max_iter)
        Maximum number of iterations of the k-means algorithm for a
        single run.
    **Number of clusters/centroids** (n_clusters)
        The number of clusters to form as well as the number of
        centroids to generate.
    **Number of runs** (n_init)
        Number of times the k-means algorithm will be run with different
        centroid seeds. The final result will be the best output of
        ``n_init`` consecutive runs in terms of inertia.
    **Random seed** (random_state)
        Determines random number generation for centroid initialization. Use
        an int to make the randomness deterministic.
        See the scikit-learn glossary entry for ``random_state``.
    **Tolerance** (tol)
        Relative tolerance with regard to the Frobenius norm of the
        difference in the cluster centers of two consecutive iterations to
        declare convergence.

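The configuration options above can be related to code as in the sketch
below. It assumes that each option is passed unchanged to the scikit-learn
``KMeans`` parameter named in parentheses; the synthetic data and the chosen
values are illustrative only, not the node's defaults.

.. code-block:: python

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic data: 150 two-dimensional samples around 3 centers.
    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

    params = dict(
        algorithm="elkan",   # K-means algorithm
        init="k-means++",    # Initialization method
        max_iter=300,        # Maximum number of iterations
        n_clusters=3,        # Number of clusters/centroids
        n_init=5,            # Number of runs
        random_state=42,     # Random seed
        tol=1e-4,            # Tolerance
    )

    first = KMeans(**params).fit(X)
    second = KMeans(**params).fit(X)

    # With an integer seed, repeated fits on the same data give the same result.
    same_labels = np.array_equal(first.labels_, second.labels_)

    # Initial centers may also be given explicitly as an array of shape
    # (n_clusters, n_features); one run is enough since nothing is random.
    init_centers = X[:3]
    explicit = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(X)

Fixing the seed is mainly useful for reproducible workflows; leaving it unset
lets each of the ``n_init`` runs start from different centroid seeds on every
execution.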

Implementation
==============

.. automodule:: node_clustering
    :noindex:

.. class:: KMeansClustering
    :noindex: